Overview

Dataset statistics

Number of variables: 12
Number of observations: 714
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 67.1 KiB
Average record size in memory: 96.2 B

Variable types

Numeric: 6
Categorical: 6

Alerts

Name has a high cardinality: 714 distinct values (High cardinality)
Ticket has a high cardinality: 542 distinct values (High cardinality)
df_index is highly correlated with PassengerId (High correlation)
PassengerId is highly correlated with df_index (High correlation)
Pclass is highly correlated with Fare and 1 other field (High correlation)
Fare is highly correlated with Pclass (High correlation)
Sex is highly correlated with Survived (High correlation)
Survived is highly correlated with Sex (High correlation)
Embark is highly correlated with Pclass (High correlation)
df_index is uniformly distributed (Uniform)
PassengerId is uniformly distributed (Uniform)
Name is uniformly distributed (Uniform)
Ticket is uniformly distributed (Uniform)
df_index has unique values (Unique)
PassengerId has unique values (Unique)
Name has unique values (Unique)
SibSp has 471 (66.0%) zeros (Zeros)
Parch has 521 (73.0%) zeros (Zeros)

Reproduction

Analysis started: 2022-10-09 08:01:16.724608
Analysis finished: 2022-10-09 08:01:23.340626
Duration: 6.62 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 714
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 447.5826331
Minimum: 0
Maximum: 890
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 49.65
Q1: 221.25
Median: 444
Q3: 676.75
95-th percentile: 848.7
Maximum: 890
Range: 890
Interquartile range (IQR): 455.5

Descriptive statistics

Standard deviation: 259.1195244
Coefficient of variation (CV): 0.578931141
Kurtosis: -1.224109035
Mean: 447.5826331
Median Absolute Deviation (MAD): 227.5
Skewness: -0.0006094557038
Sum: 319574
Variance: 67142.92794
Monotonicity: Strictly increasing
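
Several of the figures above are derived from others in the same block. The following is a minimal sketch, using only values copied from this report, that recomputes the mean, coefficient of variation, variance, IQR and range for df_index. The variable names are illustrative and are not part of pandas-profiling.

```python
# Values copied from the df_index summary in this report.
n = 714                      # number of observations
total = 319574               # Sum
std = 259.1195244            # Standard deviation
q1, q3 = 221.25, 676.75      # Q1 and Q3
minimum, maximum = 0, 890    # Minimum and Maximum

mean = total / n                  # Mean = Sum / number of observations
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
value_range = maximum - minimum   # Range = Maximum - Minimum

print(f"mean={mean:.7f} cv={cv:.9f} variance={variance:.5f} "
      f"iqr={iqr} range={value_range}")
```

The recomputed values agree with the reported ones up to rounding of the printed inputs.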
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 1 | 0.1%
621 | 1 | 0.1%
594 | 1 | 0.1%
595 | 1 | 0.1%
597 | 1 | 0.1%
599 | 1 | 0.1%
600 | 1 | 0.1%
603 | 1 | 0.1%
604 | 1 | 0.1%
605 | 1 | 0.1%
Other values (704) | 704 | 98.6%

Minimum values

Value | Count | Frequency (%)
0 | 1 | 0.1%
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
890 | 1 | 0.1%
889 | 1 | 0.1%
887 | 1 | 0.1%
886 | 1 | 0.1%
885 | 1 | 0.1%
884 | 1 | 0.1%
883 | 1 | 0.1%
882 | 1 | 0.1%
881 | 1 | 0.1%
880 | 1 | 0.1%

PassengerId
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 714
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 448.5826331
Minimum: 1
Maximum: 891
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 50.65
Q1: 222.25
Median: 445
Q3: 677.75
95-th percentile: 849.7
Maximum: 891
Range: 890
Interquartile range (IQR): 455.5

Descriptive statistics

Standard deviation: 259.1195244
Coefficient of variation (CV): 0.5776405624
Kurtosis: -1.224109035
Mean: 448.5826331
Median Absolute Deviation (MAD): 227.5
Skewness: -0.0006094557038
Sum: 320288
Variance: 67142.92794
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1 | 1 | 0.1%
622 | 1 | 0.1%
595 | 1 | 0.1%
596 | 1 | 0.1%
598 | 1 | 0.1%
600 | 1 | 0.1%
601 | 1 | 0.1%
604 | 1 | 0.1%
605 | 1 | 0.1%
606 | 1 | 0.1%
Other values (704) | 704 | 98.6%

Minimum values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%
11 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
891 | 1 | 0.1%
890 | 1 | 0.1%
888 | 1 | 0.1%
887 | 1 | 0.1%
886 | 1 | 0.1%
885 | 1 | 0.1%
884 | 1 | 0.1%
883 | 1 | 0.1%
882 | 1 | 0.1%
881 | 1 | 0.1%

Survived
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value | Count
0 | 424
1 | 290

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 714
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 424 | 59.4%
1 | 290 | 40.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0 | 424 | 59.4%
1 | 290 | 40.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 424 | 59.4%
1 | 290 | 40.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 714 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 424 | 59.4%
1 | 290 | 40.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 714 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 424 | 59.4%
1 | 290 | 40.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 714 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 424 | 59.4%
1 | 290 | 40.6%

Pclass
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value | Count
3 | 355
1 | 186
2 | 173

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 714
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Common Values

Value | Count | Frequency (%)
3 | 355 | 49.7%
1 | 186 | 26.1%
2 | 173 | 24.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
3 | 355 | 49.7%
1 | 186 | 26.1%
2 | 173 | 24.2%

Most occurring characters

Value | Count | Frequency (%)
3 | 355 | 49.7%
1 | 186 | 26.1%
2 | 173 | 24.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 714 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 355 | 49.7%
1 | 186 | 26.1%
2 | 173 | 24.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 714 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
3 | 355 | 49.7%
1 | 186 | 26.1%
2 | 173 | 24.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 714 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
3 | 355 | 49.7%
1 | 186 | 26.1%
2 | 173 | 24.2%

Name
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 714
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value | Count
Braund, Mr. Owen Harris | 1
Kimball, Mr. Edwin Nelson Jr | 1
Chapman, Mr. John Henry | 1
Van Impe, Mr. Jean Baptiste | 1
Johnson, Mr. Alfred | 1
Other values (709) | 709

Length

Max length: 82
Median length: 52
Mean length: 27.69327731
Min length: 13

Characters and Unicode

Total characters: 19773
Distinct characters: 59
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
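
To illustrate how character properties can be read off a text value, here is a minimal sketch that counts the characters of one sample name by Unicode general category, using Python's standard `unicodedata` module. The helper `category_counts` and the `CATEGORY_NAMES` mapping are illustrative assumptions and are not taken from pandas-profiling; the long names simply mirror the category labels used in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)  # two-letter code such as "Lu"
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


counts = category_counts("Braund, Mr. Owen Harris")
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

Summing such counts over all 714 names would give a table of the same shape as "Most occurring categories" for this variable.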

Unique

Unique: 714
Unique (%): 100.0%

Sample

1st row: Braund, Mr. Owen Harris
2nd row: Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3rd row: Heikkinen, Miss. Laina
4th row: Futrelle, Mrs. Jacques Heath (Lily May Peel)
5th row: Allen, Mr. William Henry

Common Values

Value | Count | Frequency (%)
Braund, Mr. Owen Harris | 1 | 0.1%
Kimball, Mr. Edwin Nelson Jr | 1 | 0.1%
Chapman, Mr. John Henry | 1 | 0.1%
Van Impe, Mr. Jean Baptiste | 1 | 0.1%
Johnson, Mr. Alfred | 1 | 0.1%
Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan") | 1 | 0.1%
Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy) | 1 | 0.1%
Torber, Mr. Ernst William | 1 | 0.1%
Homer, Mr. Harry ("Mr E Haven") | 1 | 0.1%
Lindell, Mr. Edvard Bengtsson | 1 | 0.1%
Other values (704) | 704 | 98.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
mr | 402 | 13.5%
miss | 146 | 4.9%
mrs | 112 | 3.8%
william | 55 | 1.9%
john | 36 | 1.2%
master | 36 | 1.2%
henry | 28 | 0.9%
charles | 19 | 0.6%
james | 18 | 0.6%
george | 18 | 0.6%
Other values (1297) | 2097 | 70.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2255 | 11.4%
r | 1591 | 8.0%
e | 1390 | 7.0%
a | 1375 | 7.0%
i | 1113 | 5.6%
s | 1095 | 5.5%
n | 1090 | 5.5%
l | 914 | 4.6%
M | 884 | 4.5%
o | 818 | 4.1%
Other values (49) | 7248 | 36.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 12785 | 64.7%
Uppercase Letter | 2972 | 15.0%
Space Separator | 2255 | 11.4%
Other Punctuation | 1500 | 7.6%
Open Punctuation | 125 | 0.6%
Close Punctuation | 125 | 0.6%
Dash Punctuation | 11 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 1591 | 12.4%
e | 1390 | 10.9%
a | 1375 | 10.8%
i | 1113 | 8.7%
s | 1095 | 8.6%
n | 1090 | 8.5%
l | 914 | 7.1%
o | 818 | 6.4%
t | 557 | 4.4%
d | 423 | 3.3%
Other values (16) | 2419 | 18.9%

Uppercase Letter
Value | Count | Frequency (%)
M | 884 | 29.7%
A | 222 | 7.5%
J | 176 | 5.9%
H | 169 | 5.7%
E | 149 | 5.0%
C | 147 | 4.9%
S | 145 | 4.9%
W | 120 | 4.0%
B | 119 | 4.0%
L | 110 | 3.7%
Other values (15) | 731 | 24.6%

Other Punctuation
Value | Count | Frequency (%)
. | 715 | 47.7%
, | 714 | 47.6%
" | 70 | 4.7%
/ | 1 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 2255 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 125 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 125 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 11 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 15757 | 79.7%
Common | 4016 | 20.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 1591 | 10.1%
e | 1390 | 8.8%
a | 1375 | 8.7%
i | 1113 | 7.1%
s | 1095 | 6.9%
n | 1090 | 6.9%
l | 914 | 5.8%
M | 884 | 5.6%
o | 818 | 5.2%
t | 557 | 3.5%
Other values (41) | 4930 | 31.3%

Common
Value | Count | Frequency (%)
(space) | 2255 | 56.2%
. | 715 | 17.8%
, | 714 | 17.8%
( | 125 | 3.1%
) | 125 | 3.1%
" | 70 | 1.7%
- | 11 | 0.3%
/ | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 19773 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 2255 | 11.4%
r | 1591 | 8.0%
e | 1390 | 7.0%
a | 1375 | 7.0%
i | 1113 | 5.6%
s | 1095 | 5.5%
n | 1090 | 5.5%
l | 914 | 4.6%
M | 884 | 4.5%
o | 818 | 4.1%
Other values (49) | 7248 | 36.7%

Sex
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value | Count
male | 453
female | 261

Length

Max length: 6
Median length: 4
Mean length: 4.731092437
Min length: 4

Characters and Unicode

Total characters: 3378
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male

Common Values

Value | Count | Frequency (%)
male | 453 | 63.4%
female | 261 | 36.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
male | 453 | 63.4%
female | 261 | 36.6%

Most occurring characters

Value | Count | Frequency (%)
e | 975 | 28.9%
m | 714 | 21.1%
a | 714 | 21.1%
l | 714 | 21.1%
f | 261 | 7.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3378 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 975 | 28.9%
m | 714 | 21.1%
a | 714 | 21.1%
l | 714 | 21.1%
f | 261 | 7.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3378 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 975 | 28.9%
m | 714 | 21.1%
a | 714 | 21.1%
l | 714 | 21.1%
f | 261 | 7.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3378 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 975 | 28.9%
m | 714 | 21.1%
a | 714 | 21.1%
l | 714 | 21.1%
f | 261 | 7.7%

Age
Real number (ℝ≥0)

Distinct: 88
Distinct (%): 12.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.69911765
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 4
Q1: 20.125
Median: 28
Q3: 38
95-th percentile: 56
Maximum: 80
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.52649733
Coefficient of variation (CV): 0.4891221855
Kurtosis: 0.1782741536
Mean: 29.69911765
Median Absolute Deviation (MAD): 9
Skewness: 0.3891077823
Sum: 21205.17
Variance: 211.0191247
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
24 | 30 | 4.2%
22 | 27 | 3.8%
18 | 26 | 3.6%
19 | 25 | 3.5%
28 | 25 | 3.5%
30 | 25 | 3.5%
21 | 24 | 3.4%
25 | 23 | 3.2%
36 | 22 | 3.1%
29 | 20 | 2.8%
Other values (78) | 467 | 65.4%

Minimum values

Value | Count | Frequency (%)
0.42 | 1 | 0.1%
0.67 | 1 | 0.1%
0.75 | 2 | 0.3%
0.83 | 2 | 0.3%
0.92 | 1 | 0.1%
1 | 7 | 1.0%
2 | 10 | 1.4%
3 | 6 | 0.8%
4 | 10 | 1.4%
5 | 4 | 0.6%

Maximum values

Value | Count | Frequency (%)
80 | 1 | 0.1%
74 | 1 | 0.1%
71 | 2 | 0.3%
70.5 | 1 | 0.1%
70 | 2 | 0.3%
66 | 1 | 0.1%
65 | 3 | 0.4%
64 | 2 | 0.3%
63 | 2 | 0.3%
62 | 4 | 0.6%

SibSp
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.512605042
Minimum: 0
Maximum: 5
Zeros: 471
Zeros (%): 66.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 2
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9297834541
Coefficient of variation (CV): 1.813839853
Kurtosis: 7.044950785
Mean: 0.512605042
Median Absolute Deviation (MAD): 0
Skewness: 2.519576762
Sum: 366
Variance: 0.8644972716
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value | Count | Frequency (%)
0 | 471 | 66.0%
1 | 183 | 25.6%
2 | 25 | 3.5%
4 | 18 | 2.5%
3 | 12 | 1.7%
5 | 5 | 0.7%

Minimum values

Value | Count | Frequency (%)
0 | 471 | 66.0%
1 | 183 | 25.6%
2 | 25 | 3.5%
3 | 12 | 1.7%
4 | 18 | 2.5%
5 | 5 | 0.7%

Maximum values

Value | Count | Frequency (%)
5 | 5 | 0.7%
4 | 18 | 2.5%
3 | 12 | 1.7%
2 | 25 | 3.5%
1 | 183 | 25.6%
0 | 471 | 66.0%

Parch
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.431372549
Minimum: 0
Maximum: 6
Zeros: 521
Zeros (%): 73.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8532893658
Coefficient of variation (CV): 1.978079893
Kurtosis: 8.853125533
Mean: 0.431372549
Median Absolute Deviation (MAD): 0
Skewness: 2.618913989
Sum: 308
Variance: 0.7281027418
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value | Count | Frequency (%)
0 | 521 | 73.0%
1 | 110 | 15.4%
2 | 68 | 9.5%
5 | 5 | 0.7%
3 | 5 | 0.7%
4 | 4 | 0.6%
6 | 1 | 0.1%

Minimum values

Value | Count | Frequency (%)
0 | 521 | 73.0%
1 | 110 | 15.4%
2 | 68 | 9.5%
3 | 5 | 0.7%
4 | 4 | 0.6%
5 | 5 | 0.7%
6 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
6 | 1 | 0.1%
5 | 5 | 0.7%
4 | 4 | 0.6%
3 | 5 | 0.7%
2 | 68 | 9.5%
1 | 110 | 15.4%
0 | 521 | 73.0%

Ticket
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 542
Distinct (%): 75.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value | Count
347082 | 7
3101295 | 6
CA 2144 | 6
347088 | 6
382652 | 5
Other values (537) | 684

Length

Max length: 18
Median length: 17
Mean length: 6.841736695
Min length: 3

Characters and Unicode

Total characters: 4885
Distinct characters: 35
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 431
Unique (%): 60.4%

Sample

1st row: A/5 21171
2nd row: PC 17599
3rd row: STON/O2. 3101282
4th row: 113803
5th row: 373450

Common Values

Value | Count | Frequency (%)
347082 | 7 | 1.0%
3101295 | 6 | 0.8%
CA 2144 | 6 | 0.8%
347088 | 6 | 0.8%
382652 | 5 | 0.7%
S.O.C. 14879 | 5 | 0.7%
113760 | 4 | 0.6%
1601 | 4 | 0.6%
19950 | 4 | 0.6%
347077 | 4 | 0.6%
Other values (532) | 663 | 92.9%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
pc | 50 | 5.5%
c.a | 26 | 2.8%
a/5 | 15 | 1.6%
2 | 12 | 1.3%
ston/o | 12 | 1.3%
347082 | 7 | 0.8%
sc/paris | 7 | 0.8%
ca | 7 | 0.8%
soton/o.q | 6 | 0.7%
3101295 | 6 | 0.7%
Other values (567) | 768 | 83.8%

Most occurring characters

Value | Count | Frequency (%)
3 | 569 | 11.6%
1 | 569 | 11.6%
2 | 472 | 9.7%
7 | 407 | 8.3%
4 | 381 | 7.8%
0 | 340 | 7.0%
5 | 324 | 6.6%
6 | 315 | 6.4%
9 | 259 | 5.3%
8 | 234 | 4.8%
Other values (25) | 1015 | 20.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3870 | 79.2%
Uppercase Letter | 546 | 11.2%
Other Punctuation | 247 | 5.1%
Space Separator | 202 | 4.1%
Lowercase Letter | 20 | 0.4%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 127 | 23.3%
P | 84 | 15.4%
O | 84 | 15.4%
A | 65 | 11.9%
S | 62 | 11.4%
N | 35 | 6.4%
T | 31 | 5.7%
W | 13 | 2.4%
Q | 10 | 1.8%
I | 9 | 1.6%
Other values (6) | 26 | 4.8%

Decimal Number
Value | Count | Frequency (%)
3 | 569 | 14.7%
1 | 569 | 14.7%
2 | 472 | 12.2%
7 | 407 | 10.5%
4 | 381 | 9.8%
0 | 340 | 8.8%
5 | 324 | 8.4%
6 | 315 | 8.1%
9 | 259 | 6.7%
8 | 234 | 6.0%

Lowercase Letter
Value | Count | Frequency (%)
a | 5 | 25.0%
s | 5 | 25.0%
r | 4 | 20.0%
i | 4 | 20.0%
l | 1 | 5.0%
e | 1 | 5.0%

Other Punctuation
Value | Count | Frequency (%)
. | 166 | 67.2%
/ | 81 | 32.8%

Space Separator
Value | Count | Frequency (%)
(space) | 202 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4319 | 88.4%
Latin | 566 | 11.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
C | 127 | 22.4%
P | 84 | 14.8%
O | 84 | 14.8%
A | 65 | 11.5%
S | 62 | 11.0%
N | 35 | 6.2%
T | 31 | 5.5%
W | 13 | 2.3%
Q | 10 | 1.8%
I | 9 | 1.6%
Other values (12) | 46 | 8.1%

Common
Value | Count | Frequency (%)
3 | 569 | 13.2%
1 | 569 | 13.2%
2 | 472 | 10.9%
7 | 407 | 9.4%
4 | 381 | 8.8%
0 | 340 | 7.9%
5 | 324 | 7.5%
6 | 315 | 7.3%
9 | 259 | 6.0%
8 | 234 | 5.4%
Other values (3) | 449 | 10.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4885 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
3 | 569 | 11.6%
1 | 569 | 11.6%
2 | 472 | 9.7%
7 | 407 | 8.3%
4 | 381 | 7.8%
0 | 340 | 7.0%
5 | 324 | 6.6%
6 | 315 | 6.4%
9 | 259 | 5.3%
8 | 234 | 4.8%
Other values (25) | 1015 | 20.8%

Fare
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 220
Distinct (%): 30.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.69451401
Minimum: 0
Maximum: 512.3292
Zeros: 7
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.225
Q1: 8.05
Median: 15.7417
Q3: 33.375
95-th percentile: 120
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 25.325

Descriptive statistics

Standard deviation: 52.9189295
Coefficient of variation (CV): 1.52528234
Kurtosis: 30.92424901
Mean: 34.69451401
Median Absolute Deviation (MAD): 8.2334
Skewness: 4.653630368
Sum: 24771.883
Variance: 2800.4131
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
13 | 41 | 5.7%
26 | 30 | 4.2%
8.05 | 29 | 4.1%
10.5 | 24 | 3.4%
7.8958 | 23 | 3.2%
7.925 | 18 | 2.5%
7.75 | 14 | 2.0%
7.775 | 14 | 2.0%
26.55 | 13 | 1.8%
7.8542 | 13 | 1.8%
Other values (210) | 495 | 69.3%

Minimum values

Value | Count | Frequency (%)
0 | 7 | 1.0%
4.0125 | 1 | 0.1%
5 | 1 | 0.1%
6.2375 | 1 | 0.1%
6.4375 | 1 | 0.1%
6.45 | 1 | 0.1%
6.4958 | 2 | 0.3%
6.75 | 2 | 0.3%
6.975 | 2 | 0.3%
7.0458 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
512.3292 | 3 | 0.4%
263 | 4 | 0.6%
262.375 | 2 | 0.3%
247.5208 | 2 | 0.3%
227.525 | 3 | 0.4%
211.5 | 1 | 0.1%
211.3375 | 3 | 0.4%
164.8667 | 2 | 0.3%
153.4625 | 3 | 0.4%
151.55 | 4 | 0.6%

Embark
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value | Count
2 | 556
0 | 130
1 | 28

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 714
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 0
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 556 | 77.9%
0 | 130 | 18.2%
1 | 28 | 3.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
2 | 556 | 77.9%
0 | 130 | 18.2%
1 | 28 | 3.9%

Most occurring characters

Value | Count | Frequency (%)
2 | 556 | 77.9%
0 | 130 | 18.2%
1 | 28 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 714 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 556 | 77.9%
0 | 130 | 18.2%
1 | 28 | 3.9%

Most occurring scripts

Value | Count | Frequency (%)
Common | 714 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 556 | 77.9%
0 | 130 | 18.2%
1 | 28 | 3.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 714 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 556 | 77.9%
0 | 130 | 18.2%
1 | 28 | 3.9%

Interactions

[Pairwise interaction plots (36 SVG figures, Matplotlib v3.5.2) not reproduced.]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
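
The calculation described above can be sketched in plain Python: rank both variables (giving tied values their average rank) and divide the covariance of the ranks by the product of their standard deviations. This is an illustrative sketch, not the implementation used by the report; the function names `average_ranks` and `spearman_rho` are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship still gives rho close to 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```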

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
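
As a minimal sketch of that calculation (not the report's own implementation; the name `pearson_r` is an assumption), the covariance and the standard deviations can be computed directly. The shared 1/n factors cancel, so plain sums of deviations are enough.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)
# Shifting and rescaling y by a positive factor leaves r unchanged.
r_rescaled = pearson_r(x, [3.0 * v + 10.0 for v in y])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.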

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
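
The pair-counting description above corresponds to the variant usually called tau-a, which applies no correction for ties. The sketch below implements exactly that description; it is an illustration under that assumption, the function name `kendall_tau_a` is hypothetical, and the report's own figures may come from a tie-corrected variant.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; the pair counts toward neither side
    return (concordant - discordant) / pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```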

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
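
A minimal sketch of a bias-corrected Cramér's V following Bergsma's correction is shown below. It works on a contingency table given as a list of rows; the function name `cramers_v_corrected` and the example tables are hypothetical and are not taken from this dataset or from the pandas-profiling source.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return sqrt(phi2_corr / denom)


# Hypothetical 2x2 tables: perfect association and exact independence.
print(cramers_v_corrected([[50, 0], [0, 50]]))
print(cramers_v_corrected([[25, 25], [25, 25]]))
```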

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

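row | df_index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embark
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | 2
1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | 0
2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | 2
3 | 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | 2
4 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | 2
5 | 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | 2
6 | 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | 2
7 | 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | 2
8 | 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | 0
9 | 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | 2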

Last rows

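row | df_index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embark
704 | 880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | 2
705 | 881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | 2
706 | 882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | 2
707 | 883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | 2
708 | 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | 2
709 | 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | 1
710 | 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | 2
711 | 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | 2
712 | 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | 0
713 | 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | 1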